In this session, we will use the Black Friday Data available in Kaggle to study how to make the following graphical displays.
Here is a list of common arguments:
To understand customer purchase behavior across products in different categories, the retail company “ABC Private Limited”, in the United Kingdom, shared a purchase summary of various customers for selected high-volume products from the last month. The data contain the following variables.
Rows: 550,068
Columns: 12
$ User_ID <dbl> 1000001, 1000001, 1000001, 1000001, 1000002~
$ Product_ID <chr> "P00069042", "P00248942", "P00087842", "P00~
$ Gender <chr> "F", "F", "F", "F", "M", "M", "M", "M", "M"~
$ Age <chr> "0-17", "0-17", "0-17", "0-17", "55+", "26-~
$ Occupation <dbl> 10, 10, 10, 10, 16, 15, 7, 7, 7, 20, 20, 20~
$ City_Category <chr> "A", "A", "A", "A", "C", "A", "B", "B", "B"~
$ Stay_In_Current_City_Years <chr> "2", "2", "2", "2", "4+", "3", "2", "2", "2~
$ Marital_Status <dbl> 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0~
$ Product_Category_1 <dbl> 3, 1, 12, 12, 8, 1, 1, 1, 1, 8, 5, 8, 8, 1,~
$ Product_Category_2 <dbl> NA, 6, NA, 14, NA, 2, 8, 15, 16, NA, 11, NA~
$ Product_Category_3 <dbl> NA, 14, NA, NA, NA, NA, 17, NA, NA, NA, NA,~
$ Purchase <dbl> 8370, 15200, 1422, 1057, 7969, 15227, 19215~
A bar chart is a graphical display well suited to a general audience. Here, we study the distribution of the age groups of the company’s customers who purchased its products on Black Friday.
Usage: barplot(height, …)
A bar chart can be horizontal or vertical. Using the argument col, we can assign a color to the bars, and the argument main can be used to set the title of the figure. Colors can be given by name or as RGB color codes.
Note: The margins of a figure can be set with the mar argument of the par() function. The order of the settings is c(bottom, left, top, right).
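As a minimal sketch of the points above, the following chunk draws a horizontal bar chart with a custom color, title, and margins. The vector age is a small made-up stand-in for the Age column, not the real Black Friday data, and the hex color is an arbitrary choice.

```r
# Toy stand-in for the Age column (made-up values, NOT the real data).
age <- c("0-17", "18-25", "26-35", "26-35", "36-45", "26-35",
         "18-25", "46-50", "51-55", "55+", "36-45", "26-35")

# barplot() expects bar heights, so tabulate the categories first.
age_counts <- table(age)

# Widen the left margin; the order is c(bottom, left, top, right).
old_par <- par(mar = c(5, 6, 4, 2))

barplot(age_counts,
        horiz = TRUE,        # draw horizontal bars
        las = 1,             # keep axis labels horizontal
        col = "#3182BD",     # RGB color given as a hex code
        main = "Customers by Age Group",
        xlab = "Number of purchases")

par(old_par)  # restore the previous margin settings
```

With the real data, the same call should work after replacing age with the Age column of the data frame.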
Similarly, we can use a pie chart to study the distribution of the city category.
Usage: pie(x, …)
Tip: Use color palette to choose colors (Google search: color scheme generator).
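A pie chart of the city categories might look like the sketch below. The vector city is a made-up stand-in for the City_Category column, and the three hex colors are an arbitrary palette chosen for illustration.

```r
# Toy stand-in for the City_Category column (made-up values, NOT the real data).
city <- c("A", "B", "B", "C", "C", "C", "B", "A", "C", "B")

# Count how many observations fall in each category.
city_counts <- table(city)

# Percentage of observations in each category, used in the slice labels.
pct <- round(100 * prop.table(city_counts))

pie(city_counts,
    labels = paste0(names(city_counts), " (", pct, "%)"),
    col = c("#1B9E77", "#D95F02", "#7570B3"),  # one color per slice
    main = "Customers by City Category")
```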
A histogram is used when we want to study the distribution of a quantitative variable. Here, we study the distribution of the customer purchase amount.
Usage: hist(x, …)
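The following sketch shows a typical hist() call. Since the data file is not loaded here, purchase is simulated with rgamma() purely for illustration; its shape does not reflect the real Purchase variable.

```r
# Simulated purchase amounts (an assumption for illustration, NOT the real data).
set.seed(1)
purchase <- round(rgamma(500, shape = 4, scale = 2300))

# breaks is only a suggestion; R picks "pretty" cut points close to that number.
h <- hist(purchase,
          breaks = 20,
          col = "#9ECAE1",
          border = "white",
          main = "Distribution of Purchase Amount",
          xlab = "Purchase amount")
```

hist() invisibly returns the bin boundaries and counts, so h$breaks and h$counts can be inspected after plotting.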
Here, we talk about another graphical display that can be used to study the distribution of a quantitative variable: box and whisker plot (boxplot).
Usage: boxplot(x, …) or boxplot(formula, …)
In general, a boxplot is used when we want to compare the distribution of a quantitative variable across several groups. In the following, we compare the distribution of the customer purchase amount among different age groups.
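The formula interface makes this group comparison straightforward. In the sketch below, the data frame toy and its simulated values are assumptions made for illustration; only the column names Age and Purchase come from the data description.

```r
# Simulated data frame (an assumption for illustration, NOT the real data).
set.seed(2)
age_levels <- c("0-17", "18-25", "26-35", "36-45", "46-50", "51-55", "55+")
toy <- data.frame(
  Age = factor(sample(age_levels, 300, replace = TRUE), levels = age_levels),
  Purchase = round(rgamma(300, shape = 4, scale = 2300))
)

# Purchase ~ Age reads as "Purchase split by Age": one box per age group.
boxplot(Purchase ~ Age,
        data = toy,
        col = "#A1D99B",
        main = "Purchase Amount by Age Group",
        xlab = "Age group",
        ylab = "Purchase amount")
```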
When we want to study the relationship between two quantitative variables, a scatterplot can be used. Since this data set doesn’t have another quantitative variable, we will use the built-in data set mtcars in R and study the relationship between miles per gallon and the weight of vehicles.
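A minimal scatterplot of the two mtcars variables could be drawn as follows. The columns wt (weight in 1000 lbs) and mpg (miles per gallon) are part of the built-in data set; the plotting symbol and color are arbitrary choices.

```r
# mtcars ships with R, so no data loading is needed.
plot(mtcars$wt, mtcars$mpg,
     pch = 19,               # solid circles
     col = "#3182BD",
     main = "Miles per Gallon vs. Weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon")
```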
Since the Black Friday Data are not time series data, it is not appropriate to use a line plot. In the following code chunk, we create a data frame using the forecasted highest temperatures from July 13 to July 22 in 2022 (The Weather Channel).
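A sketch of such a data frame and the resulting line plot is given below. The dates match the period described above, but the temperatures are placeholder numbers, not the actual forecast values, and the column names date and high are assumptions.

```r
# Ten consecutive days, July 13 to July 22, 2022.
temps <- data.frame(
  date = seq(as.Date("2022-07-13"), as.Date("2022-07-22"), by = "day"),
  # Placeholder values only, NOT the actual forecast.
  high = c(88, 90, 91, 89, 87, 90, 93, 95, 94, 92)
)

# type = "o" draws both the points and the connecting line.
plot(temps$date, temps$high,
     type = "o",
     pch = 19,
     col = "#DE2D26",
     main = "Forecasted Daily High Temperature",
     xlab = "Date",
     ylab = "High temperature")
```

Using type = "l" instead would draw the line without the point markers.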